Regularized estimation of large covariance matrices

نویسندگان

PETER J. BICKEL

ELIZAVETA LEVINA

چکیده

This paper considers estimating a covariance matrix of p variables from n observations by either banding or tapering the sample covariance matrix, or estimating a banded version of the inverse of the covariance. We show that these estimates are consistent in the operator norm as long as (logp)/n→ 0, and obtain explicit rates. The results are uniform over some fairly natural well-conditioned families of covariance matrices. We also introduce an analogue of the Gaussian white noise model and show that if the population covariance is embeddable in that model and well-conditioned, then the banded approximations produce consistent estimates of the eigenvalues and associated eigenvectors of the covariance matrix. The results can be extended to smooth versions of banding and to non-Gaussian distributions with sufficiently short tails. A resampling approach is proposed for choosing the banding parameter in practice. This approach is illustrated numerically on both simulated and real data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Regularized Estimation of High-dimensional Covariance Matrices

متن کامل

CONDITION NUMBER REGULARIZED COVARIANCE ESTIMATION By Joong - Ho Won

Estimation of high-dimensional covariance matrices is known to be a difficult problem, has many applications, and is of current interest to the larger statistics community. In many applications including so-called the “large p small n” setting, the estimate of the covariance matrix is required to be not only invertible, but also well-conditioned. Although many regularization schemes attempt to ...

متن کامل

Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values

Estimating the mean and the covariance matrix of an incomplete dataset and filling in missing values with imputed values is generally a nonlinear problem, which must be solved iteratively. The expectation maximization (EM) algorithm for Gaussian data, an iterative method both for the estimation of mean values and covariance matrices from incomplete datasets and for the imputation of missing val...

متن کامل

Penalized model-based clustering with unconstrained covariance matrices.

Clustering is one of the most useful tools for high-dimensional analysis, e.g., for microarray data. It becomes challenging in presence of a large number of noise variables, which may mask underlying clustering structures. Therefore, noise removal through variable selection is necessary. One effective way is regularization for simultaneous parameter estimation and variable selection in model-ba...

متن کامل

A Clt for Regularized Sample Covariance Matrices

We consider the spectral properties of a class of regularized estimators of (large) empirical covariance matrices corresponding to stationary (but not necessarily Gaussian) sequences, obtained by banding. We prove a law of large numbers (similar to that proved in the Gaussian case by Bickel and Levina), which implies that the spectrum of a banded empirical covariance matrix is an efficient esti...

متن کامل